Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model



Abstract

In this paper we consider stochastic multiarmed bandit problems. Recently a policy, DMED, was proposed and proved to achieve the asymptotic bound for the model in which each reward distribution is supported in a known bounded interval, e.g. [0,1]. However, the derived regret bound is described in an asymptotic form, and the performance in finite time has been unknown. We inspect this policy and derive a finite-time regret bound by refining large deviation probabilities to a simple finite form. Further, this observation reveals that the assumption on the lower-boundedness of the support is not essential and can be replaced with a weaker one, the existence of the moment generating function.


Bibliographic details

Authors: Honda, Junya; Takemura, Akimichi
Year: 2012
Original format: PDF
Language: English

Similar literature

1. Non-Asymptotic Analysis of a New Bandit Algorithm for Semi-Bounded Rewards [J]. Junya Honda, Akimichi Takemura. Journal of Machine Learning Research. 2015, April issue

2. Regret bounds for Narendra-Shapiro bandit algorithms [J]. Sébastien Gadat, Fabien Panloup, Sofiane Saadane. Stochastics: An International Journal of Probability and Stochastic Processes. 2018, issue 5a8

3. Bandits with Budgets: Regret Lower Bounds and Optimal Algorithms [J]. Richard Combes, Chong Jiang, Rayadurgam Srikant. Performance Evaluation Review. 2015, issue 1

4. Finite-time Regret Bounds for the Multiarmed Bandit Problem [C]. Nicolo Cesa-Bianchi, Paul Fischer. Machine learning. 1998

5. From Stability to Low-Regret Algorithms in Stochastic Multi-Armed Bandits [D]. Kuan-Sung Huang. 2021

6. Algorithmic bias amplifies opinion fragmentation and polarization: A bounded confidence model [O]. Alina Sîrbu, Dino Pedreschi, Fosca Giannotti

7. Regret bounds for Narendra-Shapiro bandit algorithms [O]. Sébastien Gadat, Fabien Panloup, Sofiane Saadane. 2016
